Skip to content

[fix](fe) Prevent cast project pushdown through union distinct#64080

Open
morrySnow wants to merge 2 commits into
apache:masterfrom
morrySnow:fix-push-project-union
Open

[fix](fe) Prevent cast project pushdown through union distinct#64080
morrySnow wants to merge 2 commits into
apache:masterfrom
morrySnow:fix-push-project-union

Conversation

@morrySnow

@morrySnow morrySnow commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: PushProjectThroughUnion previously pushed cast projections below UNION DISTINCT without checking whether the cast preserved distinctness. For non-injective casts, distinct was then evaluated after the cast instead of before the cast, which could incorrectly collapse rows that were distinct in the original union output. This PR adds DataType.canSafetyCastTo to model casts that are safe for this rewrite, uses it to allow cast pushdown through UNION DISTINCT only when the cast is distinctness-preserving, and keeps UNION ALL pushdown unchanged. The implementation also covers primitive and complex types, including length-insensitive string-like casts, and fixes edge cases found during review for integral, map, and struct types.

Release note

None

Check List (For Author)

  • Test: Regression test / Unit Test
    • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushProjectThroughUnionTest
    • ./run-fe-ut.sh --run org.apache.doris.nereids.types.DataTypeTest
    • ./run-fe-ut.sh --run org.apache.doris.nereids.types.DataTypeTest,org.apache.doris.nereids.rules.rewrite.PushProjectThroughUnionTest
    • ./run-regression-test.sh --run -d nereids_syntax_p0 -s set_operation
    • mvn checkstyle:check -pl fe-core
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow morrySnow marked this pull request as ready for review June 3, 2026 10:56
@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The PR prevents non-injective cast projections from being pushed below UNION DISTINCT, preserving distinctness before the outer cast. The added unit guard and regression query cover the reported datetime-to-date collision scenario.
  • Scope and clarity: The code change is small and focused on the existing canPushProject gate. It does not introduce a new rule or alter rule registration.
  • Concurrency and lifecycle: Not applicable; this is a deterministic optimizer rewrite predicate and test update with no shared mutable state, locks, threads, static initialization, or lifecycle-sensitive resources.
  • Configuration and compatibility: No new configs, persistence formats, FE-BE protocol fields, or storage compatibility concerns.
  • Parallel paths: The rewrite is attached to LogicalSetOperation, but SQL analysis rejects INTERSECT ALL and EXCEPT ALL, and the guarded cast path remains appropriate for UNION ALL while DISTINCT set operations now reject non-slot projections.
  • Conditions and error handling: The new qualifier condition is explicit and conservative. No Status/exception handling paths are changed.
  • Tests and expected results: The unit test checks ALL vs DISTINCT behavior, and the regression test validates the end-to-end duplicate-preserving result for UNION DISTINCT followed by a lossy cast. The output is deterministic via order_qt.
  • Observability, transactions, memory: Not applicable for this optimizer-only change.
  • Performance: The change may skip some previously allowed harmless alias/cast pushdowns through DISTINCT, but this is an acceptable conservative tradeoff for correctness.

User focus: No additional user-provided review focus was supplied.

Verification performed: git diff --check passed. I attempted ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushProjectThroughUnionTest, but this runner is missing thirdparty/installed/bin/protoc, so generated-source setup failed before tests could run.

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29308 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f9d17b5358d804108f4fcef763d9dd5be1740aad, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17591	4100	4037	4037
q2	q3	10784	1409	829	829
q4	4693	473	345	345
q5	7556	892	580	580
q6	186	181	139	139
q7	780	844	659	659
q8	9363	1729	1624	1624
q9	5889	4491	4451	4451
q10	6761	1817	1564	1564
q11	433	270	249	249
q12	634	437	287	287
q13	18141	3384	2788	2788
q14	264	259	246	246
q15	q16	793	772	705	705
q17	1000	955	915	915
q18	7176	5712	5468	5468
q19	1248	1255	1103	1103
q20	506	405	260	260
q21	6007	2780	2731	2731
q22	482	387	328	328
Total cold run time: 100287 ms
Total hot run time: 29308 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5208	4921	4876	4876
q2	q3	4922	5304	4755	4755
q4	2188	2243	1411	1411
q5	4892	4962	4779	4779
q6	238	187	138	138
q7	1853	1817	1598	1598
q8	2581	2261	2196	2196
q9	8053	7801	7464	7464
q10	4733	4671	4261	4261
q11	537	411	377	377
q12	738	734	522	522
q13	3030	3541	2766	2766
q14	285	279	247	247
q15	q16	686	716	614	614
q17	1321	1278	1278	1278
q18	7274	6951	6976	6951
q19	1125	1110	1106	1106
q20	2228	2209	1972	1972
q21	5424	4721	4895	4721
q22	536	474	428	428
Total cold run time: 57852 ms
Total hot run time: 52460 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 170262 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f9d17b5358d804108f4fcef763d9dd5be1740aad, data reload: false

query5	4313	652	503	503
query6	456	198	181	181
query7	4849	572	289	289
query8	375	226	209	209
query9	8738	4064	4068	4064
query10	458	323	266	266
query11	5956	2360	2152	2152
query12	155	104	99	99
query13	1256	630	403	403
query14	6312	5397	5074	5074
query14_1	4382	4372	4344	4344
query15	210	198	176	176
query16	1011	462	428	428
query17	1035	687	550	550
query18	2443	463	359	359
query19	200	178	140	140
query20	125	112	107	107
query21	214	136	117	117
query22	13624	13550	13391	13391
query23	17461	16602	16158	16158
query23_1	16229	16349	16408	16349
query24	7593	1786	1318	1318
query24_1	1304	1325	1295	1295
query25	596	475	427	427
query26	1289	328	186	186
query27	2678	585	366	366
query28	4465	2024	2040	2024
query29	1137	626	510	510
query30	326	243	203	203
query31	1126	1093	967	967
query32	116	66	61	61
query33	547	336	260	260
query34	1187	1159	662	662
query35	765	808	724	724
query36	1409	1398	1266	1266
query37	158	110	92	92
query38	3189	3155	3027	3027
query39	944	906	901	901
query39_1	884	875	903	875
query40	224	128	106	106
query41	75	68	68	68
query42	100	97	95	95
query43	318	322	276	276
query44	
query45	198	201	181	181
query46	1073	1224	747	747
query47	2338	2395	2286	2286
query48	433	430	301	301
query49	642	485	369	369
query50	1049	354	262	262
query51	4339	4290	4285	4285
query52	91	92	80	80
query53	248	271	189	189
query54	284	234	221	221
query55	82	80	74	74
query56	251	244	235	235
query57	1433	1401	1318	1318
query58	259	226	219	219
query59	1611	1705	1432	1432
query60	279	274	240	240
query61	159	164	162	162
query62	703	647	588	588
query63	232	187	188	187
query64	2551	802	636	636
query65	
query66	1813	471	345	345
query67	29751	29871	29461	29461
query68	
query69	422	306	277	277
query70	988	951	984	951
query71	307	221	213	213
query72	2966	2711	2207	2207
query73	846	752	440	440
query74	5138	4927	4785	4785
query75	2659	2593	2262	2262
query76	2307	1155	766	766
query77	359	372	305	305
query78	12394	12502	11852	11852
query79	1451	996	746	746
query80	585	481	394	394
query81	448	281	247	247
query82	588	166	125	125
query83	369	285	251	251
query84	291	141	110	110
query85	926	558	443	443
query86	377	319	288	288
query87	3407	3326	3183	3183
query88	3646	2748	2720	2720
query89	417	388	330	330
query90	1957	180	198	180
query91	178	181	138	138
query92	65	60	59	59
query93	1460	1414	829	829
query94	526	358	325	325
query95	697	459	354	354
query96	1076	785	336	336
query97	2670	2687	2550	2550
query98	210	209	234	209
query99	1181	1181	1077	1077
Total cold run time: 251355 ms
Total hot run time: 170262 ms

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow morrySnow force-pushed the fix-push-project-union branch from 1cc69e1 to 0b76718 Compare June 4, 2026 11:13
@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one correctness issue that can still make PushProjectThroughUnion change UNION DISTINCT results.

Checkpoint conclusions:

  • Goal/test: The PR addresses the intended datetime-to-date regression and adds unit/regression coverage, but coverage misses complex-to-character casts with non-injective nested formatting.
  • Scope/focus: The rewrite change is focused, but the new safe-cast classification is broader than the proof needed by the rule.
  • Concurrency/lifecycle/config/compatibility/persistence/writes: Not applicable; this is an optimizer rewrite/type-classification change.
  • Parallel paths: UNION ALL remains unchanged; UNION DISTINCT now uses the new classification, but complex-to-character paths need the same injectivity standard as primitive casts.
  • Test result review: The added regression output is deterministic and matches the intended case. Additional negative coverage is needed for complex values containing float/double before this is safe.
  • Observability/performance: Not applicable beyond normal rewrite behavior.
  • User focus: No additional user-provided focus points were supplied.

Please fix the complex-to-character safe-cast classification before merging.

public boolean canSafetyCastTo(DataType target) {
if (target instanceof ArrayType) {
return itemType.canSafetyCastTo(((ArrayType) target).itemType);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canSafetyCastTo is now used to decide whether UNION DISTINCT duplicate elimination may be moved after the cast, so true here must mean the cast is injective. Complex-to-character casts are not always injective: for arrays/maps/structs containing FLOAT or DOUBLE, direct float/double-to-string is intentionally not marked safe, but complex stringification delegates to the nested serde (for arrays, DataTypeArraySerDe::to_string calls nested_serde->to_string) and BE formats float/double with only digits10 + 1 significant digits in cast_to_string.h. Two distinct arrays whose float elements differ only beyond that formatted precision can therefore stringify to the same value; after this rewrite the pushed cast would make UNION DISTINCT collapse a row that the original plan kept. The same unconditional target instanceof CharacterType pattern in MapType and StructType has the same issue. Please either leave complex-to-character casts unsafe or require the nested types themselves to be injective when cast to the character target.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue in the new safe-cast model. The PR goal is clear: allow project/cast pushdown through UNION DISTINCT only when the cast preserves distinctness. The implementation mostly follows the existing rule shape and has focused FE unit/regression coverage for primitive datetime/integer cases, but the complex-to-string cases currently marked safe are not distinctness-preserving and can change query results.

Critical checkpoint conclusions:

  • Goal/test: The goal is partially achieved for tested primitive cases, but complex-to-string safe-cast coverage violates the core UNION DISTINCT invariant; existing tests assert these cases safe instead of covering a collision.
  • Scope/clarity: The change is focused, but the new canSafetyCastTo contract needs stricter evidence before broad type-family allowances.
  • Concurrency/lifecycle/config/compatibility/observability/transactions/data writes: Not applicable; this is an optimizer rewrite/type-semantics change.
  • Parallel paths: The rule is used both directly and via LogicalSetOperation project processing; the unsafe predicate affects both paths.
  • Test coverage/results: Added tests cover the original datetime regression and some type predicates, but miss non-injective complex stringification.
  • Performance: No new performance issue found.

User focus: No additional user-provided review focus was specified.

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29503 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0b76718bdd2b2371fc9eded250f9aeaff26aaa4f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17633	4044	4085	4044
q2	q3	10744	1385	814	814
q4	4686	478	349	349
q5	7540	885	588	588
q6	189	178	140	140
q7	788	855	631	631
q8	9433	1646	1621	1621
q9	5999	4549	4546	4546
q10	6749	1800	1551	1551
q11	435	274	263	263
q12	635	432	290	290
q13	18108	3335	2812	2812
q14	270	262	245	245
q15	q16	824	780	713	713
q17	999	905	935	905
q18	6814	5770	5679	5679
q19	1291	1219	1118	1118
q20	519	408	262	262
q21	6272	2874	2621	2621
q22	468	384	311	311
Total cold run time: 100396 ms
Total hot run time: 29503 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5203	4743	4970	4743
q2	q3	4944	5218	4630	4630
q4	2119	2176	1403	1403
q5	4864	4893	4738	4738
q6	230	175	126	126
q7	1865	1760	1584	1584
q8	2408	2135	2144	2135
q9	7909	7851	7354	7354
q10	4742	4681	4199	4199
q11	557	387	349	349
q12	727	738	531	531
q13	3000	3359	2779	2779
q14	274	277	267	267
q15	q16	673	690	635	635
q17	1287	1255	1264	1255
q18	7294	6973	6823	6823
q19	1112	1111	1106	1106
q20	2208	2211	1939	1939
q21	5272	4537	4447	4447
q22	513	460	409	409
Total cold run time: 57201 ms
Total hot run time: 51452 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169465 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0b76718bdd2b2371fc9eded250f9aeaff26aaa4f, data reload: false

query5	4327	646	490	490
query6	436	204	179	179
query7	4809	558	284	284
query8	360	218	213	213
query9	8745	3989	3981	3981
query10	463	319	255	255
query11	5894	2346	2132	2132
query12	150	101	101	101
query13	1260	629	446	446
query14	6404	5380	5077	5077
query14_1	4385	4396	4419	4396
query15	215	199	175	175
query16	1046	455	441	441
query17	1153	689	564	564
query18	2565	472	335	335
query19	193	176	136	136
query20	113	106	105	105
query21	224	132	125	125
query22	13623	13552	13377	13377
query23	17304	16604	16210	16210
query23_1	16229	16408	16246	16246
query24	7768	1760	1302	1302
query24_1	1312	1325	1334	1325
query25	557	477	372	372
query26	1284	321	169	169
query27	2691	561	346	346
query28	4481	2045	2016	2016
query29	1085	595	468	468
query30	312	235	199	199
query31	1103	1083	937	937
query32	107	66	58	58
query33	544	329	243	243
query34	1161	1151	661	661
query35	777	778	677	677
query36	1417	1423	1257	1257
query37	152	102	91	91
query38	3207	3135	3036	3036
query39	940	923	908	908
query39_1	876	867	874	867
query40	215	125	102	102
query41	65	61	63	61
query42	93	90	92	90
query43	315	321	274	274
query44	
query45	198	183	181	181
query46	1100	1264	733	733
query47	2432	2331	2318	2318
query48	404	421	303	303
query49	637	481	348	348
query50	1032	352	249	249
query51	4348	4262	4273	4262
query52	87	88	77	77
query53	245	263	186	186
query54	263	227	199	199
query55	80	75	69	69
query56	248	227	222	222
query57	1442	1411	1325	1325
query58	242	215	207	207
query59	1577	1630	1418	1418
query60	272	245	238	238
query61	161	159	157	157
query62	717	656	590	590
query63	233	189	184	184
query64	2535	781	654	654
query65	
query66	1771	466	344	344
query67	29622	29569	29474	29474
query68	
query69	415	298	268	268
query70	975	954	915	915
query71	309	223	210	210
query72	3003	2485	2375	2375
query73	877	803	415	415
query74	5167	4940	4760	4760
query75	2651	2570	2230	2230
query76	2323	1136	764	764
query77	369	380	293	293
query78	12374	12381	11827	11827
query79	1248	1008	782	782
query80	563	516	419	419
query81	452	286	248	248
query82	248	160	121	121
query83	278	280	261	261
query84	270	150	115	115
query85	908	609	527	527
query86	339	302	289	289
query87	3360	3336	3201	3201
query88	3639	2761	2750	2750
query89	417	382	331	331
query90	2194	185	177	177
query91	187	186	151	151
query92	62	61	61	61
query93	1371	1454	851	851
query94	545	365	307	307
query95	733	375	445	375
query96	1115	780	341	341
query97	2689	2705	2545	2545
query98	209	206	205	205
query99	1179	1176	1026	1026
Total cold run time: 250867 ms
Total hot run time: 169465 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 17.14% (12/70) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 52.86% (37/70) 🎉
Increment coverage report
Complete coverage report

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: PushProjectThroughUnion allowed cast projections to be pushed below UNION DISTINCT. For non-injective casts, distinct was then evaluated after the cast instead of before the cast, which could incorrectly collapse rows that were distinct in the original union output. Restrict cast project pushdown to UNION ALL and add tests for the qualifier guard and the DORIS-26135 datetime-to-date regression case.

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PushProjectThroughUnionTest
    - ./run-regression-test.sh --run -d nereids_syntax_p0 -s set_operation
    - mvn checkstyle:check -pl fe-core
- Behavior changed: No
- Does this need documentation: No
@morrySnow morrySnow force-pushed the fix-push-project-union branch from 0b76718 to b3c474c Compare June 8, 2026 15:23
@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one remaining data-correctness issue in the new injective-cast predicate. The main rule change itself is focused and the datetime-to-date UNION DISTINCT regression covers one important unsafe cast, but JSONB-to-character is still marked safe even though the BE cast collapses distinct JSON values.

Critical checkpoint conclusions:

  • Goal/test: The goal is to prevent pushing non-injective casts below UNION DISTINCT; tests cover datetime-to-date and several predicate cases, but not JSONB-to-string.
  • Scope: The implementation is small and localized to the rewrite predicate plus type methods.
  • Concurrency/lifecycle/config/incompatibility: No concurrency, lifecycle, config, persistence, storage format, or FE-BE protocol compatibility concerns found in this PR.
  • Parallel paths: UNION ALL remains intentionally permissive; UNION DISTINCT now uses the new predicate, but the predicate still has an unsafe JSONB character path.
  • Test coverage/results: Added unit/regression tests are relevant, but missing coverage for JSONB string-vs-number collisions. Existing review threads already cover complex-to-character serialization issues, so I did not duplicate them.
  • Observability/performance: No new observability requirement; no material performance concern found in the reviewed path.

User focus: No additional user-provided review focus was specified.

Comment thread fe/fe-core/src/main/java/org/apache/doris/nereids/types/JsonType.java Outdated
@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 52.86% (37/70) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 17.14% (12/70) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29672 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b3c474ca2f33194267346d7bf418d79823f9b199, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17624	4031	3989	3989
q2	q3	10809	1424	832	832
q4	4690	477	341	341
q5	7549	874	612	612
q6	189	177	138	138
q7	793	841	630	630
q8	9342	1695	1685	1685
q9	5800	4550	4556	4550
q10	6774	1802	1560	1560
q11	441	299	259	259
q12	629	451	301	301
q13	20720	3533	2836	2836
q14	278	267	244	244
q15	q16	839	791	755	755
q17	1024	1040	923	923
q18	7070	5935	5674	5674
q19	1386	1210	1080	1080
q20	523	595	306	306
q21	6079	2853	2641	2641
q22	457	376	316	316
Total cold run time: 103016 ms
Total hot run time: 29672 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5010	4730	4810	4730
q2	q3	4974	5334	4682	4682
q4	2100	2238	1393	1393
q5	5063	4726	4703	4703
q6	235	179	127	127
q7	2043	1750	1562	1562
q8	2426	2169	2185	2169
q9	7827	7492	7486	7486
q10	4794	4684	4272	4272
q11	549	429	382	382
q12	738	745	535	535
q13	3052	3451	2765	2765
q14	281	281	249	249
q15	q16	695	705	614	614
q17	1306	1272	1273	1272
q18	7194	6868	6858	6858
q19	1129	1084	1127	1084
q20	2234	2224	1972	1972
q21	5369	4680	4537	4537
q22	538	473	428	428
Total cold run time: 57557 ms
Total hot run time: 51820 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 170047 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b3c474ca2f33194267346d7bf418d79823f9b199, data reload: false

query5	4322	629	514	514
query6	449	199	182	182
query7	4854	562	304	304
query8	372	231	205	205
query9	8778	4110	4103	4103
query10	443	310	271	271
query11	5973	2384	2217	2217
query12	162	107	99	99
query13	1246	611	441	441
query14	6630	5500	5150	5150
query14_1	4493	4507	4426	4426
query15	214	203	181	181
query16	1026	424	460	424
query17	1110	701	585	585
query18	2735	484	342	342
query19	212	180	143	143
query20	119	118	108	108
query21	213	144	117	117
query22	13821	13698	13519	13519
query23	17411	16667	16216	16216
query23_1	16352	16281	16375	16281
query24	7523	1798	1327	1327
query24_1	1326	1327	1350	1327
query25	587	442	400	400
query26	1286	315	173	173
query27	2620	551	335	335
query28	4409	2031	2017	2017
query29	1097	646	490	490
query30	321	241	203	203
query31	1124	1073	948	948
query32	137	63	59	59
query33	549	320	269	269
query34	1215	1165	663	663
query35	759	784	700	700
query36	1387	1389	1229	1229
query37	166	107	93	93
query38	3239	3189	3101	3101
query39	954	932	912	912
query39_1	900	873	861	861
query40	235	131	109	109
query41	73	72	69	69
query42	99	100	99	99
query43	332	338	297	297
query44	
query45	203	201	199	199
query46	1110	1288	734	734
query47	2323	2309	2257	2257
query48	405	417	304	304
query49	656	523	349	349
query50	1040	358	255	255
query51	4375	4324	4262	4262
query52	89	90	78	78
query53	246	276	191	191
query54	286	223	198	198
query55	81	76	70	70
query56	259	221	212	212
query57	1450	1398	1297	1297
query58	248	214	214	214
query59	1614	1754	1466	1466
query60	282	252	236	236
query61	157	154	158	154
query62	702	654	586	586
query63	240	189	190	189
query64	2515	814	623	623
query65	
query66	1742	470	344	344
query67	29696	29727	29512	29512
query68	
query69	430	300	271	271
query70	1018	1002	943	943
query71	306	230	213	213
query72	3107	2986	2476	2476
query73	874	769	446	446
query74	5162	5003	4800	4800
query75	2709	2574	2257	2257
query76	2371	1169	780	780
query77	377	395	297	297
query78	12361	12430	11897	11897
query79	1479	1092	779	779
query80	610	470	406	406
query81	465	283	250	250
query82	923	165	119	119
query83	367	285	256	256
query84	
query85	912	549	452	452
query86	418	302	284	284
query87	3463	3365	3183	3183
query88	3691	2766	2749	2749
query89	425	402	334	334
query90	1976	181	191	181
query91	181	167	140	140
query92	62	63	62	62
query93	1475	1478	902	902
query94	553	336	310	310
query95	690	402	448	402
query96	1006	816	368	368
query97	2700	2705	2556	2556
query98	218	224	200	200
query99	1187	1192	1032	1032
Total cold run time: 252870 ms
Total hot run time: 170047 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 7.19% (12/167) 🎉
Increment coverage report
Complete coverage report

englefly
englefly previously approved these changes Jun 10, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 10, 2026
@github-actions

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: PushProjectThroughUnion can safely push casts through UNION DISTINCT only when the cast preserves distinctness. Add DataType.canSafetyCastTo coverage for primitive and complex types, fix unsafe or incomplete edge cases found during review, treat string-like casts as length-insensitive, and add unit tests for the new interface and pushdown qualifier behavior.

None

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.nereids.types.DataTypeTest,org.apache.doris.nereids.rules.rewrite.PushProjectThroughUnionTest
    - ./run-fe-ut.sh --run org.apache.doris.nereids.types.DataTypeTest
    - mvn checkstyle:check -pl fe-core
- Behavior changed: No
- Does this need documentation: No
@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Jun 10, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one new blocking correctness issue in the DISTINCT set-operation guard.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to prevent project pushdown from changing set-operation distinctness. The new tests cover a single-column non-injective cast through UNION DISTINCT, but they do not cover projections that duplicate/drop set-op output slots; that path can still change results.
  • Scope/focus: The implementation is small and focused, but the safety check is only per expression and is not sufficient for the whole projection tuple.
  • Concurrency/lifecycle/config/compatibility/transactions/FE-BE protocol/observability: Not applicable for this FE optimizer metadata/rewrite change.
  • Parallel paths: The rule is attached to LogicalSetOperation. INTERSECT/EXCEPT ALL are rejected during binding, so the reachable ALL fast path is UNION ALL; DISTINCT set ops rely on the same incomplete projection guard.
  • Tests/results: Existing new regression output is deterministic, but coverage is missing for duplicate-slot/permutation cases under DISTINCT.
  • Performance: No new performance issue found.
  • User focus: No additional user-provided review focus was supplied.

I did not repeat the already-open inline threads about JSONB/complex-to-character cast injectivity.

Expression expr;
if (logicalSetOperation.getQualifier().equals(Qualifier.ALL)) {
expr = ExpressionUtils.getExpressionCoveredByCast(e.child(0));
} else {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still lets UNION DISTINCT run after a projection that is not injective as a tuple. The check proves each expression is a slot or an injective cast of one slot, but it never verifies that the referenced slots form a permutation of the set-op output. For example:

select cast(a as bigint), cast(a as bigint)
from (select 1 a, 2 b union select 1, 3) t;

Before the rewrite, the inner union keeps two distinct (a,b) rows and the outer project returns two (1,1) rows. After this rule pushes the project into both children, the union distinct sees only (1,1) and collapses them to one row. For non-UNION ALL set ops, please require the stripped slots to cover every output slot exactly once (or otherwise prove the whole projection vector is injective), not just each expression independently.

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29594 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 06d2fea080e873ea81629116944125d44366e55e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17779	4112	4109	4109
q2	q3	10756	1334	791	791
q4	4686	480	356	356
q5	7536	860	586	586
q6	186	175	139	139
q7	797	845	650	650
q8	9775	1696	1686	1686
q9	7029	4564	4510	4510
q10	6871	1808	1529	1529
q11	432	272	249	249
q12	629	423	302	302
q13	18125	3480	2765	2765
q14	267	261	243	243
q15	q16	824	774	706	706
q17	954	932	957	932
q18	6978	5795	5703	5703
q19	1301	1299	1067	1067
q20	531	418	263	263
q21	6152	2715	2682	2682
q22	457	372	326	326
Total cold run time: 102065 ms
Total hot run time: 29594 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5163	4829	4855	4829
q2	q3	5121	5224	4636	4636
q4	2135	2205	1422	1422
q5	4849	4830	4678	4678
q6	244	187	134	134
q7	1894	1759	1564	1564
q8	2531	2131	1926	1926
q9	7542	7492	7431	7431
q10	4777	4724	4239	4239
q11	536	389	355	355
q12	726	735	535	535
q13	3030	3444	2829	2829
q14	275	284	266	266
q15	q16	681	708	615	615
q17	1298	1277	1277	1277
q18	7540	6955	6808	6808
q19	1127	1133	1126	1126
q20	2224	2253	1973	1973
q21	5349	4639	4478	4478
q22	554	454	407	407
Total cold run time: 57596 ms
Total hot run time: 51528 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 169669 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 06d2fea080e873ea81629116944125d44366e55e, data reload: false

query5	4320	660	479	479
query6	467	200	186	186
query7	4871	565	308	308
query8	361	219	198	198
query9	8754	4053	4025	4025
query10	474	314	264	264
query11	5941	2357	2196	2196
query12	164	106	102	102
query13	1282	636	421	421
query14	6408	5411	5130	5130
query14_1	4416	4459	4426	4426
query15	208	207	182	182
query16	1020	449	445	445
query17	1147	720	589	589
query18	2616	495	350	350
query19	206	192	148	148
query20	121	116	110	110
query21	219	145	117	117
query22	13686	13632	13425	13425
query23	17558	16566	16193	16193
query23_1	16263	16336	16533	16336
query24	7585	1807	1310	1310
query24_1	1337	1325	1290	1290
query25	595	491	455	455
query26	1297	333	170	170
query27	2719	576	342	342
query28	4456	2031	1977	1977
query29	1043	622	474	474
query30	299	239	198	198
query31	1135	1080	956	956
query32	101	61	62	61
query33	500	316	251	251
query34	1195	1134	643	643
query35	768	782	705	705
query36	1363	1367	1256	1256
query37	161	108	88	88
query38	3228	3189	3034	3034
query39	981	915	898	898
query39_1	871	871	892	871
query40	222	123	104	104
query41	64	62	64	62
query42	97	98	96	96
query43	326	326	294	294
query44	
query45	197	188	184	184
query46	1071	1187	740	740
query47	2358	2359	2186	2186
query48	407	412	289	289
query49	628	470	360	360
query50	1014	347	254	254
query51	4439	4301	4261	4261
query52	87	88	77	77
query53	249	275	193	193
query54	266	225	193	193
query55	80	75	68	68
query56	230	228	214	214
query57	1398	1405	1276	1276
query58	258	214	220	214
query59	1594	1705	1420	1420
query60	298	256	236	236
query61	158	157	155	155
query62	719	648	590	590
query63	226	187	188	187
query64	2567	788	632	632
query65	
query66	1770	465	379	379
query67	29931	29795	29953	29795
query68	
query69	444	326	272	272
query70	1037	990	958	958
query71	297	231	211	211
query72	3011	2787	2412	2412
query73	890	813	436	436
query74	5221	4990	4795	4795
query75	2683	2617	2245	2245
query76	2359	1136	783	783
query77	360	382	298	298
query78	12408	12505	11971	11971
query79	1474	1079	760	760
query80	1148	458	390	390
query81	498	285	247	247
query82	578	161	122	122
query83	355	281	249	249
query84	
query85	932	527	437	437
query86	414	305	275	275
query87	3498	3377	3230	3230
query88	3684	2812	2763	2763
query89	431	387	331	331
query90	1792	186	182	182
query91	179	162	138	138
query92	66	65	60	60
query93	1410	1583	886	886
query94	647	364	323	323
query95	687	384	454	384
query96	1076	831	342	342
query97	2710	2699	2566	2566
query98	215	208	204	204
query99	1155	1172	1068	1068
Total cold run time: 252693 ms
Total hot run time: 169669 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants